MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
Given an untrimmed video and natural language query, video sentence grounding
aims to localize the target temporal moment in the video. Existing methods
mainly tackle this task by matching and aligning semantics of the descriptive
sentence and video segments on a single temporal resolution, while neglecting
the temporal consistency of video content in different resolutions. In this
work, we propose a novel multi-resolution temporal video sentence grounding
network: MRTNet, which consists of a multi-modal feature encoder, a
Multi-Resolution Temporal (MRT) module, and a predictor module. The MRT module
is an encoder-decoder network whose decoder features are combined with
Transformers to predict the final start and end timestamps. Notably, the MRT
module is hot-pluggable: it can be seamlessly incorporated into any anchor-free
model. In addition, we employ a hybrid loss to supervise the cross-modal
features in the MRT module at three scales (frame-level, clip-level, and
sequence-level) for more accurate grounding. Extensive experiments on three
prevalent datasets demonstrate the effectiveness of MRTNet.
Comment: work in progress
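The abstract does not give implementation details, but the core multi-resolution idea can be sketched as a temporal feature pyramid. The following NumPy toy (all names illustrative, not from the paper) builds coarser temporal views of per-clip features by average pooling, as the encoder path of an encoder-decoder module might:

```python
import numpy as np

def multi_resolution_features(clip_feats, num_levels=3):
    """Build a pyramid of temporally pooled features: a toy stand-in for
    the encoder path of an encoder-decoder MRT-style module.

    clip_feats: (T, D) array of per-clip features; T is assumed divisible
    by 2 ** (num_levels - 1). Names here are hypothetical.
    """
    levels = [clip_feats]
    for _ in range(num_levels - 1):
        f = levels[-1]
        # halve the temporal resolution by averaging adjacent clips
        f = f.reshape(f.shape[0] // 2, 2, f.shape[1]).mean(axis=1)
        levels.append(f)
    return levels

feats = np.random.rand(8, 4)
pyramid = multi_resolution_features(feats)
print([lv.shape for lv in pyramid])  # [(8, 4), (4, 4), (2, 4)]
```

A decoder would then upsample and fuse these levels, enforcing consistency of video content across resolutions.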
Dual Preference Distribution Learning for Item Recommendation
Recommender systems can automatically recommend items that users will probably
like. Their goal is to model the user-item interaction by effectively
representing users and items. Existing methods have primarily learned users'
preferences and items' features as vectorized embeddings, and modeled a user's
general preference for items through their interaction. In fact, users have
specific preferences for item attributes, and these different preferences are
usually related. Therefore, exploring such fine-grained preferences, as well as
modeling the relationships among a user's different preferences, could improve
recommendation performance. Toward this end, we
propose a dual preference distribution learning framework (DUPLE), which aims
to jointly learn a general preference distribution and a specific preference
distribution for a given user, where the former corresponds to the user's
general preference for items and the latter refers to the user's specific
preference for item attributes. Notably, the mean vector of each Gaussian
distribution can capture the user's preferences, and the covariance matrix can
capture the relationships among them. Moreover, we can summarize a preferred
attribute profile for each user, depicting his/her preferred item attributes. We can then
provide the explanation for each recommended item by checking the overlap
between its attributes and the user's preferred attribute profile. Extensive
quantitative and qualitative experiments on six public datasets demonstrate the
effectiveness and explainability of the DUPLE method.Comment: 23 pages, 7 figures. This manuscript has been accepted by ACM
Transactions on Information System
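The distributional idea above (mean vector as preference, covariance as preference relationships) can be illustrated with a plain multivariate-Gaussian scoring function. This NumPy sketch is generic and hypothetical, not DUPLE's actual objective:

```python
import numpy as np

def gaussian_preference_score(item_vec, mean, cov):
    """Log-density of an item embedding under a user's preference Gaussian.

    The mean encodes what the user likes on each latent attribute; the
    covariance encodes how those preferences co-vary. Illustrative only.
    """
    d = mean.shape[0]
    diff = item_vec - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ inv @ diff)

mean = np.array([1.0, 0.0])                   # what the user prefers
cov = np.array([[1.0, 0.5], [0.5, 1.0]])      # correlated preferences
close = gaussian_preference_score(np.array([1.0, 0.1]), mean, cov)
far = gaussian_preference_score(np.array([-2.0, 3.0]), mean, cov)
assert close > far  # items nearer the preference mean score higher
```

Ranking items by such a log-density makes the covariance-encoded preference relationships directly influence recommendation scores.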
Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation
Multi-modal recommendation systems, which integrate diverse types of
information, have gained widespread attention in recent years. However,
compared to traditional collaborative filtering-based multi-modal
recommendation systems, research on multi-modal sequential recommendation is
still in its nascent stages. Unlike traditional sequential recommendation
models that solely rely on item identifier (ID) information and focus on
network structure design, multi-modal recommendation models need to emphasize
item representation learning and the fusion of heterogeneous data sources. This
paper investigates the impact of item representation learning on downstream
recommendation tasks and examines the disparities in information fusion at
different stages. Empirical experiments are conducted to demonstrate the need
to design a framework suitable for collaborative learning and fusion of diverse
information. Based on this, we propose a new model-agnostic framework for
multi-modal sequential recommendation tasks, called Online
Distillation-enhanced Multi-modal Transformer (ODMT), to enhance feature
interaction and mutual learning among multi-source input (ID, text, and image),
while avoiding conflicts among different features during training, thereby
improving recommendation accuracy. To be specific, we first introduce an
ID-aware Multi-modal Transformer module in the item representation learning
stage to facilitate information interaction among different features. Secondly,
we employ an online distillation training strategy in the prediction
optimization stage to make multi-source data learn from each other and improve
prediction robustness. Experimental results on a video content recommendation
dataset and three e-commerce recommendation datasets demonstrate the
effectiveness of the two proposed modules, yielding an improvement of
approximately 10% over baseline models.
Comment: 11 pages, 7 figures
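The online-distillation step, where the ID, text, and image branches learn from each other, can be sketched as mutual learning toward an ensemble target. The following NumPy toy is a generic formulation under assumed names, not ODMT's exact loss:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mutual_distillation_loss(branch_logits):
    """Average KL divergence from the soft ensemble target to each branch.

    branch_logits: list of (C,) logit vectors, e.g. from ID, text, and
    image branches. A generic online-distillation sketch.
    """
    probs = [softmax(l) for l in branch_logits]
    teacher = np.mean(probs, axis=0)  # ensemble of all branches
    kl = lambda p, q: float(np.sum(p * (np.log(p) - np.log(q))))
    return float(np.mean([kl(teacher, p) for p in probs]))

logits = [np.array([2.0, 0.5, 0.1]),   # ID branch (hypothetical values)
          np.array([1.5, 1.0, 0.2]),   # text branch
          np.array([2.2, 0.3, 0.4])]   # image branch
loss = mutual_distillation_loss(logits)
assert loss >= 0.0  # zero only when all branches already agree
```

Minimizing such a term pulls each modality's prediction toward the consensus, which is one way to reduce conflicts among heterogeneous features during training.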
Target-Guided Composed Image Retrieval
Composed image retrieval (CIR) is a new and flexible image retrieval
paradigm, which can retrieve the target image for a multimodal query, including
a reference image and its corresponding modification text. Although existing
efforts have achieved compelling success, they overlook two aspects: modeling
the conflict relationship between the reference image and the modification text
to improve multimodal query composition, and modeling the adaptive matching
degree to promote the ranking of candidate images, which may match the given
query to different degrees. To address these two
limitations, in this work, we propose a Target-Guided Composed Image Retrieval
network (TG-CIR). In particular, TG-CIR first extracts the unified global and
local attribute features for the reference/target image and the modification
text with the contrastive language-image pre-training model (CLIP) as the
backbone, where an orthogonal regularization is introduced to promote the
independence among the attribute features. Then TG-CIR designs a target-query
relationship-guided multimodal query composition module, comprising a
target-free student composition branch and a target-based teacher composition
branch, where the target-query relationship is injected into the teacher branch
for guiding the conflict relationship modeling of the student branch. Last,
apart from the conventional batch-based classification loss, TG-CIR
additionally introduces a batch-based target similarity-guided matching degree
regularization to promote the metric learning process. Extensive experiments on
three benchmark datasets demonstrate the superiority of our proposed method.
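The orthogonal regularization mentioned above, which promotes independence among attribute features, has a standard formulation: penalize the deviation of the features' Gram matrix from the identity. This NumPy sketch shows that standard form; the paper's exact variant may differ:

```python
import numpy as np

def orthogonal_regularization(attr_feats):
    """Penalty encouraging attribute features to be mutually independent.

    attr_feats: (K, D) matrix, one row per attribute feature. Rows are
    L2-normalized, then deviation of the Gram matrix from identity is
    penalized. A common formulation, not necessarily TG-CIR's exact one.
    """
    A = attr_feats / np.linalg.norm(attr_feats, axis=1, keepdims=True)
    gram = A @ A.T                       # pairwise cosine similarities
    return float(np.sum((gram - np.eye(A.shape[0])) ** 2))

orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])  # independent features
collinear = np.array([[1.0, 0.0], [2.0, 0.0]])   # redundant features
assert orthogonal_regularization(orthogonal) < 1e-12
assert orthogonal_regularization(collinear) > 1.0
```

Adding this penalty to the training loss discourages different attribute features from encoding the same information.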
General Debiasing for Multimodal Sentiment Analysis
Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal
information for prediction yet unavoidably suffers from fitting the spurious
correlations between multimodal features and sentiment labels. For example, if
most videos with a blue background have positive labels in a dataset, the model
will rely on such correlations for prediction, while ``blue background'' is not
a sentiment-related feature. To address this problem, we define a general
debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD)
generalization ability of MSA models by reducing their reliance on spurious
correlations. To this end, we propose a general debiasing framework based on
Inverse Probability Weighting (IPW), which adaptively assigns small weights to
the samples with larger bias (i.e., more severe spurious correlations). The key
to this debiasing framework is to estimate the bias of each sample, which is
achieved by two steps: 1) disentangling the robust features and biased features
in each modality, and 2) utilizing the biased features to estimate the bias.
Finally, we employ IPW to reduce the effects of large-biased samples,
facilitating robust feature learning for sentiment prediction. To examine the
model's generalization ability, we keep the original testing sets on two
benchmarks and additionally construct multiple unimodal and multimodal OOD
testing sets. The empirical results demonstrate the superior generalization
ability of our proposed framework. We have released the code and data to
facilitate reproduction.
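The IPW step described above can be illustrated with a minimal weighted-loss sketch: each sample's loss is scaled by the inverse of its estimated bias, so heavily biased samples contribute less. This NumPy toy uses hypothetical bias scores and is not the paper's estimator:

```python
import numpy as np

def ipw_weighted_loss(losses, bias_scores, eps=1e-6):
    """Down-weight high-bias samples via inverse probability weighting.

    losses: per-sample losses; bias_scores in (0, 1), higher = more biased
    (e.g. estimated from the disentangled biased features). Weights
    1 / bias are normalized to mean 1. Illustrative sketch only.
    """
    w = 1.0 / (bias_scores + eps)
    w = w / w.mean()                 # keep the loss on a comparable scale
    return float(np.mean(w * losses))

losses = np.array([1.0, 1.0, 1.0, 4.0])
bias = np.array([0.1, 0.1, 0.1, 0.9])  # last sample is heavily biased
assert ipw_weighted_loss(losses, bias) < float(losses.mean())
```

The effect is that the model fits mainly the low-bias samples, which is what supports out-of-distribution generalization.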
Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction
Automatic bundle construction is a crucial prerequisite step in various
bundle-aware online services. Previous approaches are mostly designed to model
the bundling strategy of existing bundles. However, it is hard to acquire a
large-scale, well-curated bundle dataset, especially for platforms that have
not offered bundle services before. Even for platforms with mature bundle
services, many items are still included in few or even zero bundles, which
gives rise to sparsity and cold-start challenges for bundle construction
models. To tackle these issues, we aim to leverage multimodal
features, item-level user feedback signals, and the bundle composition
information, to achieve a comprehensive formulation of bundle construction.
Nevertheless, such formulation poses two new technical challenges: 1) how to
learn effective representations by optimally unifying multiple features, and 2)
how to address the modality-missing, noise, and sparsity problems induced by
incomplete query bundles. In this work, to address these
technical challenges, we propose a Contrastive Learning-enhanced Hierarchical
Encoder method (CLHE). Specifically, we use self-attention modules to combine
the multimodal and multi-item features, and then leverage both item- and
bundle-level contrastive learning to enhance representation learning, thereby
countering the modality-missing, noise, and sparsity problems. Extensive
experiments on four datasets in two application domains demonstrate that our
method outperforms a list of SOTA methods. The code and dataset are available
at https://github.com/Xiaohao-Liu/CLHE
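The item- and bundle-level contrastive objectives mentioned above are typically instances of an InfoNCE-style loss over in-batch negatives. The following NumPy sketch shows that generic form, not CLHE's exact loss:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE over a batch: each anchor's positive is the matching row;
    all other rows serve as in-batch negatives. Generic contrastive
    sketch of an item-/bundle-level objective.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sims = a @ p.T / temperature        # (B, B) similarity matrix
    sims = sims - sims.max(axis=1, keepdims=True)   # numerical stability
    logprob = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(logprob)))

a = np.eye(4)
aligned = info_nce(a, a)                          # matching pairs
mismatched = info_nce(a, np.roll(a, 1, axis=0))   # positives shuffled
assert aligned < mismatched
```

In a setup like CLHE's, the positives could be two augmented views of the same item or bundle representation, so the loss pulls matching views together and pushes other batch members apart.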
Federated Class-Incremental Learning with Prompting
As Web technology continues to develop, it has become increasingly common to
use data stored on different clients. At the same time, federated learning has
received widespread attention for its ability to protect data privacy while
letting models learn from data distributed across various clients. However,
most existing works assume that each client's data are fixed. In
real-world scenarios, such an assumption is most likely not true as data may be
continuously generated and new classes may also appear. To this end, we focus
on the practical and challenging federated class-incremental learning (FCIL)
problem. In FCIL, the local and global models may suffer from catastrophic
forgetting of old classes caused by the arrival of new classes, and the data
distributions of clients are non-independent and identically distributed
(non-IID).
In this paper, we propose a novel method called Federated Class-Incremental
Learning with PrompTing (FCILPT). Given privacy constraints and limited memory,
FCILPT does not use a rehearsal buffer to keep exemplars of old data. Instead,
we use prompts to ease the catastrophic forgetting of old classes.
Specifically, we encode the task-relevant and task-irrelevant knowledge into
prompts, preserving the old and new knowledge of the local clients and solving
the problem of catastrophic forgetting. We first sort the task information in
the prompt pool in the local clients to align the task information on different
clients before global aggregation. This ensures that the same task's knowledge
is fully integrated, mitigating the non-IID problem caused by missing classes
across different clients in the same incremental task. Experiments on
CIFAR-100, Mini-ImageNet, and Tiny-ImageNet demonstrate that FCILPT achieves
significant accuracy improvements over state-of-the-art methods.
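The alignment-then-aggregation step, sorting task information in each client's prompt pool so the same task's prompts are merged together, can be sketched as follows. This NumPy toy keys prompts by task id and is illustrative, not FCILPT's exact procedure:

```python
import numpy as np

def aggregate_prompt_pools(client_pools):
    """Align clients' prompt pools by task id, then average per task.

    client_pools: list of dicts mapping task_id -> prompt vector. Sorting
    by task id before averaging plays the role of the alignment step;
    names and structure here are hypothetical.
    """
    task_ids = sorted(set().union(*[p.keys() for p in client_pools]))
    merged = {}
    for t in task_ids:
        # only clients that have seen task t contribute to its prompt
        prompts = [p[t] for p in client_pools if t in p]
        merged[t] = np.mean(prompts, axis=0)
    return merged

c1 = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 2.0])}  # client 1's pool
c2 = {1: np.array([0.0, 4.0]), 2: np.array([3.0, 3.0])}  # client 2's pool
merged = aggregate_prompt_pools([c1, c2])
assert sorted(merged) == [0, 1, 2]
assert np.allclose(merged[1], [0.0, 3.0])  # task 1 averaged across clients
```

Keying the aggregation by task id rather than by pool position is what prevents prompts from different tasks being mixed when clients hold different class subsets.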
HDAC8 Inhibition Specifically Targets Inv(16) Acute Myeloid Leukemic Stem Cells by Restoring p53 Acetylation
Summary: Acute myeloid leukemia (AML) is driven and sustained by leukemia stem cells (LSCs) with unlimited self-renewal capacity and resistance to chemotherapy. Mutation in the TP53 tumor suppressor is relatively rare in de novo AML; however, p53 can be regulated through post-translational mechanisms. Here, we show that p53 activity is inhibited in inv(16)+ AML LSCs via interactions with the CBFβ-SMMHC (CM) fusion protein and histone deacetylase 8 (HDAC8). HDAC8 aberrantly deacetylates p53 and promotes LSC transformation and maintenance. HDAC8 deficiency or inhibition using HDAC8-selective inhibitors (HDAC8i) effectively restores p53 acetylation and activity. Importantly, HDAC8 inhibition induces apoptosis in inv(16)+ AML CD34+ cells, while sparing the normal hematopoietic stem cells. Furthermore, in vivo HDAC8i administration profoundly diminishes AML propagation and abrogates leukemia-initiating capacity of both murine and patient-derived LSCs. This study elucidates an HDAC8-mediated p53-inactivating mechanism promoting LSC activity and highlights HDAC8 inhibition as a promising approach to selectively target inv(16)+ LSCs.
Exploring potential genes and mechanisms linking erectile dysfunction and depression
Background: The clinical correlation between erectile dysfunction (ED) and depression has been revealed in cumulative studies. However, the evidence of shared mechanisms between them was insufficient. This study aimed to explore common transcriptomic alterations associated with ED and depression.
Materials and methods: The gene sets associated with ED and depression were collected from the Gene Expression Omnibus (GEO) database. Comparative analysis was conducted to obtain common genes. Using R software and other appropriate tools, we conducted a range of analyses, including function enrichment, interactive network creation, gene cluster analysis, and transcriptional and post-transcriptional signature profiling. Candidate hub crosslinks between ED and depression were selected after external validation and molecular experiments. Furthermore, subpopulation location and disease association of hub genes were explored.
Results: A total of 85 common genes were identified between ED and depression. These genes strongly correlate with cell adhesion, redox homeostasis, reactive oxygen species metabolic process, and neuronal cell body. An interactive network consisting of 80 proteins and 216 interactions was thereby developed. Analysis of the proteomic signature of common genes highlighted eight major shared genes: CLDN5, COL7A1, LDHA, MAP2K2, RETSAT, SEMA3A, TAGLN, and TBC1D1. These genes were involved in blood vessel morphogenesis and muscle cell activity. A subsequent transcription factor (TF)-miRNA network showed 47 TFs and 88 miRNAs relevant to the shared genes. Finally, CLDN5 and TBC1D1 were well validated and identified as the hub crosslinks between ED and depression. These genes had specific subpopulation locations in the corpus cavernosum and brain tissue, respectively.
Conclusion: Our study is the first to investigate common transcriptomic alterations and the shared biological roles of ED and depression. The findings of this study provide insights into the referential molecular mechanisms underlying the co-existence of depression and ED.